Algorithms for Efficient Top-Down Join Enumeration

Author

  • Pit Fender
Abstract

For a DBMS that supports a declarative query language like SQL, the query optimizer is a crucial piece of software. The declarative nature of a query allows it to be translated into many equivalent evaluation plans. The process of choosing a suitable plan from all alternatives is known as query optimization. This choice is based on a cost model and statistics over the data. The execution order of the join operations in a plan's operator tree is essential to its cost, since the runtimes of plans with different join orders can vary by several orders of magnitude. An exhaustive search for an optimal solution over all possible operator trees is computationally infeasible, so the search space must be restricted. Therefore, a well-accepted heuristic is applied: all possible bushy join trees are considered, while cross products are excluded from the search. There are two efficient approaches to identify the best plan: bottom-up and top-down join enumeration. However, only the top-down approach allows for branch-and-bound pruning, which can improve compile time by several orders of magnitude while still preserving optimality. Hence, this thesis focuses on top-down join enumeration.
In the first part, we present two efficient graph-partitioning algorithms suitable for top-down join enumeration. However, as we will see, they have two severe limitations: they can handle only (1) simple (binary) join predicates and (2) inner joins. Therefore, the second part adapts one of the proposed partitioning strategies to overcome those limitations. Furthermore, we propose a more generic partitioning framework that enables every graph-partitioning algorithm to handle join predicates involving more than two relations, as well as outer joins and other non-inner joins. As we will see, our framework is more efficient than the adapted graph-partitioning algorithm.
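The following is a minimal sketch of memoized top-down join enumeration over bushy trees without cross products. It is not one of the thesis's algorithms. It assumes a hypothetical four-relation chain query with made-up cardinalities and selectivities, and a C_out-style cost function that sums the sizes of all join results. The partitioning step is a naive generate-and-test enumeration of connected splits, which is a plain stand-in for the efficient graph-partitioning algorithms proposed in the thesis. Like those algorithms, it handles only binary predicates and inner joins. All names (`CARD`, `EDGES`, `partitions`, `best_plan`) are hypothetical.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical query: four relations joined in a chain R0 - R1 - R2 - R3
# by simple binary predicates. Cardinalities and selectivities are made up.
CARD = {0: 1000, 1: 10, 2: 500, 3: 20}
EDGES = {frozenset({0, 1}): 0.01, frozenset({1, 2}): 0.05, frozenset({2, 3}): 0.1}


def is_connected(rels):
    """True if `rels` induces a connected subgraph of the join graph."""
    start = min(rels)
    seen, stack = {start}, [start]
    while stack:
        r = stack.pop()
        for edge in EDGES:
            if r in edge:
                (other,) = edge - {r}
                if other in rels and other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen == set(rels)


def cardinality(rels):
    """Product of base cardinalities times the selectivities of all
    predicates whose relations lie inside `rels`."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for edge, sel in EDGES.items():
        if edge <= rels:
            size *= sel
    return size


def partitions(rels):
    """Naive generate-and-test partitioning: yield every unordered split
    (s1, s2) of the connected set `rels` whose halves are both connected.
    Some predicate then links s1 and s2, so no cross product arises."""
    first, *rest = sorted(rels)
    for k in range(len(rest)):  # s1 never takes all of `rest`, so s2 is non-empty
        for extra in combinations(rest, k):
            s1 = frozenset((first, *extra))  # pin `first` to s1: skip mirror splits
            s2 = rels - s1
            if is_connected(s1) and is_connected(s2):
                yield s1, s2


@lru_cache(maxsize=None)  # memoization: each connected subset is solved once
def best_plan(rels):
    """Top-down search: return (cost, plan) of the cheapest tree for `rels`."""
    if len(rels) == 1:
        return 0.0, f"R{min(rels)}"
    best = (float("inf"), None)
    for s1, s2 in partitions(rels):
        c1, p1 = best_plan(s1)
        c2, p2 = best_plan(s2)
        cost = c1 + c2 + cardinality(rels)  # C_out-style: sum of join result sizes
        if cost < best[0]:
            best = (cost, f"({p1} JOIN {p2})")
    return best


cost, plan = best_plan(frozenset(CARD))
print(plan, cost)
```

With these made-up statistics, the search settles on (R0 JOIN ((R1 JOIN R2) JOIN R3)) at cost 5750. Because the memo is keyed on the relation set, a subproblem such as {R1, R2} is optimized once and reused by every larger set that contains it.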
The third part of this thesis discusses the two branch-and-bound pruning strategies found in the literature. We present seven advancements to the combined strategy that (1) make pruning more effective, (2) make it more robust, and (3), most importantly, avoid the worst-case behavior otherwise observed. A range of experiments evaluates the performance improvements of the proposed methods. We use the TPC-H, TPC-DS, and SQLite test suite benchmarks to evaluate our combined contributions. As we show, in those settings the average compile-time improvement over the state of the art in bottom-up join enumeration is 100%. Our synthetic workloads show even higher improvement factors.
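The following is a minimal sketch of how branch-and-bound pruning can be layered onto memoized top-down enumeration. Each call receives a cost budget. A subproblem that cannot produce a plan cheaper than that budget returns early, and the failed budget is remembered as a lower bound for later calls. This budget-passing scheme is our own simplified illustration in the spirit of accumulated-cost bounding. It is not the combined strategy or the seven advancements described in the thesis. The statistics, the C_out-style cost function, and all names (`best_plan`, `memo`, `lower`, `stats`) are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical chain query R0 - R1 - R2 - R3 with made-up statistics.
CARD = {0: 1000, 1: 10, 2: 500, 3: 20}
EDGES = {frozenset({0, 1}): 0.01, frozenset({1, 2}): 0.05, frozenset({2, 3}): 0.1}


def is_connected(rels):
    """Depth-first search over join predicates restricted to `rels`."""
    seen, stack = set(), [min(rels)]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(x for e in EDGES if r in e for x in e if x in rels)
    return seen == set(rels)


def cardinality(rels):
    size = math.prod(CARD[r] for r in rels)
    return size * math.prod(sel for e, sel in EDGES.items() if e <= rels)


def partitions(rels):
    """Yield each unordered split of `rels` into two connected halves."""
    first, *rest = sorted(rels)
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            s1 = frozenset((first, *extra))
            if is_connected(s1) and is_connected(rels - s1):
                yield s1, rels - s1


memo = {}   # rels -> (cost, plan), proven optimal
lower = {}  # rels -> budget b such that no plan cheaper than b exists
stats = {"pruned": 0}


def best_plan(rels, budget=math.inf):
    """Return the optimal (cost, plan) for `rels` if its cost is below
    `budget`, otherwise None. All costs are >= 0."""
    if rels in memo:
        return memo[rels] if memo[rels][0] < budget else None
    if lower.get(rels, 0.0) >= budget:
        return None  # known (or trivially) unable to beat the budget
    if len(rels) == 1:  # reached only when budget > 0
        memo[rels] = (0.0, f"R{min(rels)}")
        return memo[rels]
    out = cardinality(rels)  # C_out-style cost of the final join
    best, bound = None, budget
    for s1, s2 in partitions(rels):
        # The left subtree may spend at most bound - out ...
        left = best_plan(s1, bound - out)
        # ... and the right subtree only what the left one leaves over.
        right = best_plan(s2, bound - out - left[0]) if left else None
        if right is None:
            stats["pruned"] += 1
            continue
        cost = left[0] + right[0] + out  # < bound by construction
        best, bound = (cost, f"({left[1]} JOIN {right[1]})"), cost
    if best is None:
        lower[rels] = max(lower.get(rels, 0.0), budget)
        return None
    memo[rels] = best
    return best


cost, plan = best_plan(frozenset(CARD))
print(plan, cost)
```

On this example, a plan costing 5750 for the full query is found first. The split ({R0, R1, R2}, {R3}) then leaves a budget of only 750 for {R0, R1, R2}, which is below the 2500 that its own final join would already cost. As a result, that subproblem is abandoned without ever being optimized, while the returned plan stays optimal.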

Similar resources

Counter Strike: Generic Top-Down Join Enumeration for Hypergraphs

Finding the optimal execution order of join operations is a crucial task of today’s cost-based query optimizers. There are two approaches to identify the best plan: bottom-up and top-down join enumeration. But only the top-down approach allows for branch-and-bound pruning, which can improve compile time by several orders of magnitude while still preserving optimality. For both optimization strat...

Optimization Strategy of Top-Down Join Enumeration on Modern Multi-Core CPUs

Most contemporary database system query optimizers exploit System-R’s bottom-up dynamic programming method (DP) to find the optimal query execution plan (QEP) without evaluating redundant subplans. The notable exceptions are Volcano/Cascades, which use transforms to generate new plans according to a top-down approach. As recent research has revealed, bottom-up dynamic programming can improve p...

Optimizing Join Enumeration in Transformation-based Query Optimizers

Query optimizers built on the Volcano/Cascades framework, which is based on transformation rules, are used in many commercial databases. Transformation rulesets proposed earlier for join order enumeration in such a framework either allow enumeration of joins with cross-products (which can significantly increase the cost of optimization), or generate a large number of duplicate derivations. In t...

Graceful Degradation for Top-Down Join Enumeration via similar sub-queries measure on Chip Multi-Processor

Most contemporary database system query optimizers exploit System-R’s dynamic programming method (DP) to find the optimal query execution plan (QEP) without evaluating redundant sub-plans. However, in the relational database setting today, large queries containing many joins are becoming increasingly common. Based on this trend, it has become tempting to improve the DP performance. Chip Multi-P...

Mining Frequent Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach

Data sets of very high dimensionality, such as microarray data, pose great challenges for efficient processing by most existing data mining algorithms. Recently, a row-enumeration method has been proposed that performs a bottom-up search of the row combination space to find corresponding frequent patterns. Due to a limited number of rows in microarray data, this method is more efficient than column enumer...

Publication date: 2014